We investigate evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available. Recent works in response generation have adopted metrics from machine translation to compare a model's generated response to a single target response. We show that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain. We provide quantitative and qualitative results highlighting specific weaknesses in existing metrics, and provide recommendations for future development of better automatic evaluation metrics for dialogue systems.
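As an illustration of the single-reference setup the abstract critiques, here is a minimal sketch scoring a generated dialogue response against one target response with sentence-level BLEU, one such machine-translation metric, via NLTK. The example strings are hypothetical, and this is only one of several MT metrics such a pipeline might use.

```python
# Minimal sketch: scoring a generated dialogue response against a single
# reference response with an MT metric (sentence-level BLEU via NLTK).
# The response strings below are illustrative placeholders.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "i am doing well thanks for asking".split()
generated = "i am fine thank you".split()

# Smoothing avoids zero scores when higher-order n-grams have no overlap,
# which is common for short dialogue responses.
smooth = SmoothingFunction().method1
score = sentence_bleu([reference], generated, smoothing_function=smooth)
print(f"BLEU: {score:.4f}")
```

Because valid dialogue responses are highly diverse, a reasonable generated reply can share almost no n-grams with the single reference, which is consistent with the weak correlations with human judgement reported above.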